3 research outputs found
Adaptive Pattern Extraction Multi-Task Learning for Multi-Step Conversion Estimations
Multi-task learning (MTL) aims to solve multiple tasks simultaneously with a
single model and has been successfully applied in many real-world settings. The
general idea of multi-task learning is to design a global parameter-sharing
mechanism together with task-specific feature extractors to improve the
performance of all tasks. However, balancing the trade-off among tasks remains
challenging, since model performance is sensitive to the relationships between
them: weakly correlated or even conflicting tasks deteriorate performance by
introducing unhelpful or negative information. It is therefore important to
efficiently exploit and learn a fine-grained feature representation for each
task. In this paper, we propose an
Adaptive Pattern Extraction Multi-task (APEM) framework that is adaptive and
flexible enough for large-scale industrial applications. APEM fully utilizes
feature information by learning the interactions between the input feature
fields and extracting the corresponding task-specific information. We first
introduce a DeepAuto Group Transformer module that automatically and
efficiently enhances feature expressivity with a modified set-attention
mechanism and a Squeeze-and-Excitation operation. Second, an explicit Pattern
Selector further enables selective feature-representation learning via
adaptive task-indicator vectors. Empirical evaluations show that APEM
outperforms state-of-the-art MTL methods on public and real-world
financial-services datasets. More importantly, we explore the online
performance of APEM in a real industrial-level recommendation scenario.
Comment: 18 pages, 9 figures
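The abstract names a Squeeze-and-Excitation operation as one ingredient of the DeepAuto Group Transformer. The paper's exact module is not specified here; below is a minimal, generic sketch of SE-style gating over feature-field embeddings (the shapes, weight names, and reduction ratio are illustrative assumptions, not APEM's actual design):

```python
import numpy as np

def squeeze_excitation(x, w1, w2):
    """Generic Squeeze-and-Excitation gating over feature fields.

    x : (batch, fields, dim) field embeddings
    w1: (fields, fields // r) squeeze projection (r = reduction ratio)
    w2: (fields // r, fields) excitation projection
    """
    z = x.mean(axis=-1)                      # squeeze: (batch, fields)
    s = np.maximum(z @ w1, 0.0)              # ReLU bottleneck
    s = 1.0 / (1.0 + np.exp(-(s @ w2)))      # sigmoid gate in (0, 1)
    return x * s[..., None]                  # reweight each field

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8, 4))               # 2 samples, 8 fields, dim 4
w1 = rng.normal(size=(8, 2))                 # reduction ratio r = 4
w2 = rng.normal(size=(2, 8))
y = squeeze_excitation(x, w1, w2)
print(y.shape)  # (2, 8, 4)
```

Because the gate is a sigmoid, each field embedding is only ever attenuated, never amplified, which is the mechanism SE uses to emphasize informative fields relative to the rest.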
MHSCNet: A Multimodal Hierarchical Shot-aware Convolutional Network for Video Summarization
Video summarization aims to produce a concise summary by effectively
capturing and combining the most informative parts of the whole video.
Existing approaches for video summarization regard the task as a frame-wise
keyframe selection problem and generally construct the frame-wise
representation by combining the long-range temporal dependency with the
unimodal or bimodal information. However, an optimal video summary needs to
reflect both the most valuable keyframes in their own right and the semantics
of the whole content. Thus, it is critical to construct a more powerful and
robust frame-wise representation and to predict frame-level importance scores
in a fair and comprehensive manner. To tackle these
issues, we propose a multimodal hierarchical shot-aware convolutional network,
denoted as MHSCNet, to enhance the frame-wise representation by combining all
of the available multimodal information. Specifically, we design a
hierarchical ShotConv network that produces an adaptive shot-aware frame-level
representation by considering both short-range and long-range temporal
dependencies. Based on the learned shot-aware representations, MHSCNet
can predict the frame-level importance score in the local and global view of
the video. Extensive experiments on two standard video summarization datasets
demonstrate that our proposed method consistently outperforms state-of-the-art
baselines. The source code will be made publicly available.
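The abstract describes scoring each frame from both short-range and long-range temporal context. The actual ShotConv architecture is not given here, so the following is only a toy sketch of that general idea: per-frame features are smoothed at two temporal scales, fused, and mapped to an importance score (the pooling windows and the untrained mean-based scorer are stand-in assumptions):

```python
import numpy as np

def shot_aware_scores(frames, short_k=3, long_k=9):
    """Toy sketch: fuse short- and long-range temporal context with
    moving averages, then emit one importance score per frame.

    frames: (T, d) per-frame features. Returns scores in (0, 1).
    """
    T, _ = frames.shape

    def smooth(x, k):
        # same-length moving average along the time axis
        pad = k // 2
        xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
        return np.stack([xp[t:t + k].mean(axis=0) for t in range(T)])

    local = smooth(frames, short_k)      # short-range (intra-shot) context
    glob = smooth(frames, long_k)        # long-range (cross-shot) context
    fused = np.concatenate([frames, local, glob], axis=-1)
    logits = fused.mean(axis=-1)         # stand-in for a learned scorer
    return 1.0 / (1.0 + np.exp(-logits))

scores = shot_aware_scores(np.random.default_rng(1).normal(size=(20, 8)))
print(scores.shape)  # (20,)
```

A summary would then be assembled by picking the top-scoring frames (or shots) subject to a length budget; the real model learns the fusion and scoring weights rather than averaging.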
Poincar\'{e} Heterogeneous Graph Neural Networks for Sequential Recommendation
Sequential recommendation (SR) learns users' preferences by capturing
sequential patterns from the evolution of users' behaviors. As discussed in
many works, the user-item interactions in SR generally follow an intrinsic
power-law distribution, which can be attributed to underlying hierarchy-like
structures. Previous methods usually handle such hierarchical information by
partitioning users and items empirically in Euclidean space, which may distort
user-item representations in real online scenarios. In this paper, we propose
a Poincar\'{e}-based heterogeneous graph neural network named PHGR to model the
sequential pattern information as well as hierarchical information contained in
the data of SR scenarios simultaneously. Specifically, to explicitly capture
the hierarchical information, we first construct a weighted user-item
heterogeneous graph by aligning all the user-item interactions, improving the
perception domain of each user from a global view. The resulting global
representation is then used to complement the
local directed item-item homogeneous graph convolution. By defining a novel
hyperbolic inner-product operator, the global and local graph representation
learning is conducted directly in the Poincar\'{e} ball, rather than through
the commonly used projection between the Poincar\'{e} ball and Euclidean
space; this alleviates the cumulative error of the usual bidirectional
translation process. Moreover, to explicitly capture the sequential
dependency information, we design two types of temporal attention operations
under the Poincar\'{e} ball space. Empirical evaluations on public and
financial-industry datasets show that PHGR outperforms several comparison
methods.
Comment: 32 pages, 12 figures
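PHGR's own hyperbolic inner-product and attention operators are not spelled out in the abstract, but the standard geodesic distance on the Poincar\'{e} ball illustrates why the space suits hierarchy-like data: distances blow up near the boundary, so a tree's exponentially growing leaf set can be embedded with low distortion. A minimal sketch of that standard formula (not PHGR's operator):

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between points inside the unit Poincar\'e ball:
    d(u, v) = arcosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))).
    """
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq / max(denom, eps))

origin = np.zeros(2)
p = np.array([0.5, 0.0])
q = np.array([0.9, 0.0])
# Points nearer the boundary are disproportionately far from the origin,
# which is what lets the ball host tree-like (power-law) structure.
d_p = poincare_distance(origin, p)
d_q = poincare_distance(origin, q)
```

From the origin the formula reduces to d(0, x) = 2 artanh(||x||), so d_p = ln 3 ≈ 1.10 while d_q ≈ 2.94, even though q is only 0.4 farther in Euclidean terms.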